Development of a speech recognition system for Icelandic using machine translated text
Authors
Abstract
Text corpus size is an important issue when building a language model (LM), particularly for languages where little data is available. This paper introduces an LM adaptation technique that improves an LM built from a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments, evaluated by word error rate, were performed using data machine translated (MT) from English to Icelandic on a sentence-by-sentence and a word-by-word basis. The baseline word error rate was 49.6%. Interpolating the baseline LM with an LM built from the sentence-by-sentence translated text reduced the word error rate significantly, to 41.9%.
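As a rough illustration of the LM interpolation mentioned in the abstract, the sketch below linearly mixes two toy unigram models. The abstract does not give the interpolation formula, model order, or weight, so the linear form, the unigram models, the weight `lam`, the function names, and the toy sentences are all assumptions made for illustration only.

```python
from collections import Counter


def unigram_lm(sentences):
    """Maximum-likelihood unigram probabilities from whitespace-tokenized sentences."""
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def interpolate(p_base, p_mt, lam):
    """Linear interpolation: lam * P_base(w) + (1 - lam) * P_mt(w).

    lam is a hypothetical mixing weight; in practice it would be tuned
    on held-out task-dependent text.
    """
    vocab = set(p_base) | set(p_mt)
    return {
        w: lam * p_base.get(w, 0.0) + (1.0 - lam) * p_mt.get(w, 0.0)
        for w in vocab
    }


# Toy stand-ins: a small task-dependent corpus and a larger
# machine-translated corpus (both invented for this example).
base = unigram_lm(["veðrið er gott", "veðrið er kalt"])
mt = unigram_lm(["veðrið er gott í dag", "hvernig er veðrið"])
mixed = interpolate(base, mt, lam=0.5)
```

Words that appear only in the machine-translated corpus, such as `dag` here, receive non-zero probability in the mixed model, which is the intuition behind using translated text to compensate for a small task-dependent corpus.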
Similar resources
Language Model Adaptation Using Machine-Translated Text for Resource-Deficient Languages
Text corpus size is an important issue when building a language model (LM). This is a particularly important issue for languages where little data is available. This paper introduces an LM adaptation technique to improve an LM built using a small amount of task-dependent text with the help of a machine-translated text corpus. Icelandic speech recognition experiments were performed using data, m...
Development of a WFST based Speech Recognition System for a Resource Deficient Language Using Machine Translation
Text corpus size is an important issue when building a language model (LM), in particular where insufficient training and evaluation data are available. In this paper we continue our work on creating a speech recognition system with an LM that is trained on a small amount of text in the target language. In order to get better performance we use a large amount of foreign text and a dictionary mapp...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Building an ASR Corpus Using Althingi's Parliamentary Speeches
Acoustic data acquisition for under-resourced languages is an important and challenging task. In the Icelandic parliament, Althingi, all speeches are transcribed manually and published as text on Althingi’s web page. To reduce the manual work involved, an automatic speech recognition system is being developed for Althingi. In this paper the development of a speech corpus suitable for ...
Automatic text dictation in computer-assisted translation
In this paper, we study the incorporation of statistical machine translation models into automatic speech recognition models in the framework of computer-assisted translation. The system is given a source language text to be translated and shows it to the human translator, who translates it orally. The system captures the user speech which is the dictation of the target language sent...